Abstract

In this paper we study the computation of Markov bases for contingency tables
whose cell entries have an upper bound. In general, a Markov basis for unbounded
contingency tables under a given model differs from a Markov basis for bounded
tables. Rapallo and Rogantin (2007) applied Lawrence lifting to compute a Markov
basis for contingency tables whose cell entries are bounded. However, in the process,
one has to compute the universal Gröbner basis of the ideal associated with the
design matrix for a model, which is, in general, larger than any reduced Gröbner
basis. Thus, this computation is also infeasible in small- and medium-sized problems.
In this paper we focus on bounded two-way contingency tables under the independence
model and show that if the bounds on the cells are positive, i.e., they are not
structural zeros, the set of basic moves of all 2 x 2 minors connects all tables with
given margins. We end this paper with an open problem: if we know that the given
margins are positive, find a necessary and sufficient condition on the set of
structural zeros so that the set of basic moves of all 2 x 2 minors connects all
incomplete contingency tables with given margins.

Keywords: Structural zeros; Markov basis; Universal Gröbner basis



1 Introduction 



The study of statistical models to detect complex structures in contingency tables has
received great attention in the last decades (see Agresti (2002) for an overview of such
models). Among the main research themes in this field, here we consider incomplete
contingency tables (or equivalently, tables with structural zeros) and models to go beyond
independence in two-way tables, such as quasi-independence models.

Contingency tables with upper bounds on the cell counts have recently been considered
in, e.g., Cryan et al. (2005). Bounded contingency tables can arise, for instance, in the
analysis of designed experiments with multinomial response, as in Aoki and Takemura
(2006), and in logistic regression models, as in, e.g., Chen et al. (2005). We will use some
examples from these applications later in the paper.

In recent years, the use of algebraic and geometric techniques in statistics has produced
at least two relevant advances. One is a better understanding of statistical models in terms
of varieties and polynomial equations, through the notion of toric models, as described
in Chapter 6 of Pistone et al. (2001). Moreover, algebraic statistics has introduced a
non-asymptotic method for goodness-of-fit tests following a Markov Chain Monte Carlo
approach (see Diaconis and Sturmfels (1998)). Such an algorithm is based on the notion
of Markov basis. In recent years the computation of Markov bases for special statistical
models has involved both statisticians and algebraists.

In this paper we consider the computation of Markov bases for bounded contingency
tables. A general algorithm to compute Markov bases for this case was described in
Rapallo and Rogantin (2007), using the notions of Lawrence lifting and Universal Gröbner
basis of a polynomial ideal. When a Markov basis is computed through a Universal
Gröbner basis, we say that it is a Universal Markov basis. The Markov bases for this kind
of table are in general very large, and we will show some explicit computations later in
the paper. Therefore the computation of smaller Markov bases or subbases for special
tables is a problem of major interest.

In practice, computing the Markov basis for bounded contingency tables is infeasible
because the number of elements in the Markov basis is very large. However, in some
cases, if we know that the given margins are positive, then the number of moves needed
to connect all tables is smaller than the number of elements in a Markov basis for tables
under the model. Such connecting sets were formalized in Chen et al. (2006) with the
terminology Markov subbases. In this paper we consider bounded I x J tables under the
independence model. These tables are equivalent to I x J x 2 tables under the model of
no-3-way interaction. Using this fact and the result from Chen et al. (2010), we show
that if the bounds of the cells are all positive, that is, there are no structural zeros,
then the set of basic moves of all 2 x 2 minors connects all bounded two-way contingency
tables with given margins.

To summarize, we classify the bounds of cells into the following patterns: 

(i) all cells are unbounded, 

(ii) all cells are bounded by positive integers, 

(iii) some cells are unbounded and the others are bounded by positive integers, 

(iv) some cells are unbounded and the others are structural zeros, 

(v) some cells are bounded by positive integers and the others are structural zeros, 

(vi) all types of bounds appear. 



Case (i) is the standard case, already studied in Diaconis and Sturmfels (1998). In the
past, Aoki and Takemura (2005) dealt with the case (iv). In this paper Theorem [1] deals
with the case (v), Theorem [3] deals with the case (ii), and Section [4] deals with the case (iii).

The organization of this paper is as follows. In Section [2] we recall the basic facts 
about Markov bases and bounded contingency tables. In Section [3] we present a charac- 
terization of Universal Markov bases for incomplete tables, showing that there is a simple 
connection between the Universal Markov basis for an incomplete table and the corre- 
sponding complete table. We present some explicit examples, focusing in particular on 
quasi-independence models for two-way tables. In Section [4] we show how to compute
Markov bases when the bounds involve only a subset of cell counts. In Section [5] we
show our main theorem, that is, we consider bounded two-way contingency tables under
the independence model. If we know all bounds are positive (equivalently, there are no
structural zeros), then the set of basic moves of all 2 x 2 minors connects all bounded
two-way contingency tables with given margins. We end this paper with an open problem
for incomplete contingency tables with positive margins.



2 Bounded contingency tables and Markov bases 

Let $n$ be a contingency table with $k$ cells. In order to simplify the notation, we denote
by $\mathcal{X} = \{1, \ldots, k\}$ the sample space of the contingency table. In the special
case of two-way tables with $I$ rows and $J$ columns, we will also denote the sample space
with $\mathcal{X} = \{1, \ldots, I\} \times \{1, \ldots, J\}$.

Let $\mathbb{N}$ be the set of nonnegative integers, i.e., $\mathbb{N} = \{0, 1, 2, \ldots\}$, and
let $\mathbb{Z}$ be the set of all integers, i.e., $\mathbb{Z} = \{\ldots, -2, -1, 0, 1, 2, \ldots\}$.
Without loss of generality, in this paper, we represent a table by a vector of counts
$n = (n_1, \ldots, n_k)$. Under this point of view, a contingency table $n$ can be regarded as
a function $n : \mathcal{X} \to \mathbb{N}$, but it can also be viewed as a vector
$n \in \mathbb{N}^k$.

The fiber of an observed table $n_{obs}$ with respect to a function
$T : \mathbb{N}^k \to \mathbb{N}^s$ is the set

\[ \mathcal{F}_T(n_{obs}) = \{ n \mid n \in \mathbb{N}^k,\ T(n) = T(n_{obs}) \}. \tag{1} \]

When the dependence on the specific observed table is irrelevant, we will write simply
$\mathcal{F}_T$ instead of $\mathcal{F}_T(n_{obs})$.

In the framework of mathematical statistics, the function $T$ is usually the minimal
sufficient statistic of some statistical model, and the usefulness of enumerating the fiber
$\mathcal{F}_T(n_{obs})$ follows from classical theorems such as the Rao-Blackwell theorem,
see e.g. Shao (1998).

When the function $T$ is linear, it can be extended in a natural way to a homomorphism
from $\mathbb{R}^k$ to $\mathbb{R}^s$; $T$ is then represented by an $s \times k$ matrix $A_T$,
and its generic element $A_T(\ell, h)$ is

\[ A_T(\ell, h) = T_\ell(h), \tag{2} \]

where $T_\ell$ is the $\ell$-th component of the function $T$. In terms of the matrix $A_T$,
the fiber $\mathcal{F}_T$ can be easily rewritten in the form:

\[ \mathcal{F}_T = \{ n \mid n \in \mathbb{N}^k,\ A_T(n) = A_T(n_{obs}) \}. \tag{3} \]

To navigate inside the fiber $\mathcal{F}_T$, i.e., to connect any two tables of the fiber
$\mathcal{F}_T$ with a path of nonnegative tables, algebraic statistics suggests an approach
based on the notion of Markov moves and Markov bases. A Markov move is any table $m$ with
integer entries that preserves the linear function $T$, i.e. $T(n \pm m) = T(n)$ for all
$n \in \mathcal{F}_T$.

A finite set of moves $\mathcal{M} = \{m_1, \ldots, m_r\}$ is called a Markov basis if it is
possible to connect any two tables of $\mathcal{F}_T$ with moves in $\mathcal{M}$. More
formally, for all $n_1$ and $n_2$ in $\mathcal{F}_T$, there exist a sequence of moves
$\{m_{i_1}, \ldots, m_{i_A}\}$ and a sequence of signs $\{\epsilon_{i_1}, \ldots, \epsilon_{i_A}\}$
such that

\[ n_2 = n_1 + \sum_{a=1}^{A} \epsilon_{i_a} m_{i_a} \tag{4} \]

and

\[ n_1 + \sum_{j=1}^{a} \epsilon_{i_j} m_{i_j} \ge 0 \quad \text{for all } a = 1, \ldots, A. \tag{5} \]

See Diaconis and Sturmfels (1998) for further details on Markov bases. Given a Markov
basis, the Diaconis-Sturmfels algorithm for sampling from a distribution $\sigma$ on
$\mathcal{F}_T$ starts from a table $n \in \mathcal{F}_T$ and proceeds at each step as follows:

• Choose a move $m \in \mathcal{M}$ and a sign $\epsilon = \pm 1$ with probability 1/2 each,
independently of $m$;

• Generate a random number $u$ from the uniform distribution $U[0, 1]$;

• If $n + \epsilon m \in \mathcal{F}_T$ and $\min\{\sigma(n + \epsilon m)/\sigma(n), 1\} > u$,
then the Markov chain moves from the current table $n$ to $n + \epsilon m$; otherwise, it
stays at $n$.
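
The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: tables and moves are flattened to tuples, the move is drawn uniformly from the basis, the target is taken to be the hypergeometric weight $\sigma(n) \propto 1/\prod_h n_h!$, and the names `ds_step`, `run_chain` and `hypergeometric_weight` are hypothetical.

```python
import math
import random


def hypergeometric_weight(n):
    """Unnormalised target sigma(n) proportional to 1 / prod(n_h!) (assumed target)."""
    return 1.0 / math.prod(math.factorial(c) for c in n)


def ds_step(n, moves, sigma, rng):
    """One step of the chain: propose n + eps*m and accept as in the third bullet."""
    m = rng.choice(moves)        # move drawn uniformly from the basis (assumption)
    eps = rng.choice((1, -1))    # sign, probability 1/2 each
    u = rng.random()             # uniform random number
    proposal = tuple(a + eps * b for a, b in zip(n, m))
    # A move preserves T, so the proposal lies in the fiber iff it is nonnegative.
    if min(proposal) >= 0 and min(sigma(proposal) / sigma(n), 1.0) > u:
        return proposal
    return n


def run_chain(start, moves, sigma, steps, seed=0):
    """Run the chain and return the list of visited tables (including the start)."""
    rng = random.Random(seed)
    path = [tuple(start)]
    for _ in range(steps):
        path.append(ds_step(path[-1], moves, sigma, rng))
    return path


# 2 x 2 table flattened as (n11, n12, n21, n22) with the single basic move.
path = run_chain((2, 1, 1, 2), [(1, -1, -1, 1)], hypergeometric_weight, steps=200)
print(len(set(path)), "distinct tables visited")
```

Every visited table keeps the row and column sums of the starting table, because the only way the state changes is by adding or subtracting a move.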

To actually compute Markov bases, we associate to the problem two distinct polynomial
rings. First, we define $\mathbb{R}[x] = \mathbb{R}[x_1, \ldots, x_k]$, i.e., we associate an
indeterminate $x_h$ to each cell of the table; then, we define
$\mathbb{R}[y] = \mathbb{R}[y_1, \ldots, y_s]$, with an indeterminate $y_\ell$ for each
component of the linear function $T$. In the following we will use some facts from
commutative algebra, to be found in, e.g., Cox et al. (1992).

The simplest method to compute Markov bases uses the elimination algorithm:

• For each column of the matrix $A_T$, define the polynomial

\[ f_h = x_h - \prod_{\ell=1}^{s} y_\ell^{T_\ell(h)} \quad \text{for } h = 1, \ldots, k; \tag{6} \]

Then, consider the ideal generated by the polynomials $f_1, \ldots, f_k$:

\[ \mathcal{I} = \langle f_1, \ldots, f_k \rangle \tag{7} \]

in the polynomial ring $\mathbb{R}[x, y]$;

• Eliminate the $y$ indeterminates, and obtain the ideal

\[ \mathcal{I}_{A_T} = \mathrm{Elim}(y, \mathcal{I}) \tag{8} \]

in the polynomial ring $\mathbb{R}[x]$. The ideal $\mathcal{I}_{A_T}$ in Equation (8) is by
definition the toric ideal associated to $A_T$;

• A Gröbner basis of $\mathcal{I}_{A_T}$ is formed by binomials. Each binomial defines a move
of a Markov basis by taking the exponents. Namely, the correspondence between the
binomials and the moves is given by the log-transformation

\[ \log(x^a - x^b) = a - b \in \mathbb{Z}^k. \tag{9} \]
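
As a small worked instance of the elimination scheme (an illustration added here, not taken from the paper), consider a 2 x 2 table under the independence model. The cell ordering (1,1), (1,2), (2,1), (2,2) and the convention that $y_1, y_2$ correspond to the row sums and $y_3, y_4$ to the column sums are assumptions of this sketch.

```latex
\begin{align*}
f_{11} &= x_{11} - y_1 y_3, & f_{12} &= x_{12} - y_1 y_4, \\
f_{21} &= x_{21} - y_2 y_3, & f_{22} &= x_{22} - y_2 y_4, \\
\mathrm{Elim}\bigl(y, \langle f_{11}, f_{12}, f_{21}, f_{22} \rangle\bigr)
  &= \langle x_{11} x_{22} - x_{12} x_{21} \rangle, \\
\log(x_{11} x_{22} - x_{12} x_{21}) &= (1, 0, 0, 1) - (0, 1, 1, 0) = (1, -1, -1, 1).
\end{align*}
```

The resulting vector is the basic move that adds one to the cells (1,1) and (2,2) and subtracts one from the cells (1,2) and (2,1).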



Although faster algorithms have been implemented to compute toric ideals, the
elimination-based algorithm is the simplest one and we will use this technique in some of
the proofs. For details on computational methods for toric ideals, see Bigatti et al. (1999)
and the implementation in 4ti2 (4ti2 team, 2008).



As noted in, e.g., Rapallo and Rogantin (2007) and Chen et al. (2005), when the entries
of the table have an upper bound, the classical notion of Markov basis is not sufficient to
connect all the tables in a fiber. In fact, the fiber in the bounded case:

\[ \mathcal{F}_T^b = \{ n \mid n \in \mathbb{N}^k,\ T(n) = T(n_{obs}),\ n \le b \} \tag{10} \]

is in general smaller than the unrestricted one.
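
To make the definition of the bounded fiber concrete, the following Python sketch enumerates it by brute force for a tiny 2 x 2 table. It is only meant as an illustration; the function name `bounded_fiber`, the cell ordering and the chosen margins and bounds are assumptions of this example, and the approach does not scale beyond very small tables.

```python
from itertools import product


def bounded_fiber(A, t, b):
    """All n in N^k with A n = t and n <= b, by brute force (tiny tables only)."""
    k = len(b)
    return [
        n
        for n in product(*(range(bh + 1) for bh in b))
        if all(sum(row[h] * n[h] for h in range(k)) == th for row, th in zip(A, t))
    ]


# 2 x 2 table, cells ordered (1,1), (1,2), (2,1), (2,2); the rows of A are
# the two row sums followed by the two column sums.
A = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
t = (2, 2, 2, 2)

unrestricted = bounded_fiber(A, t, (2, 2, 2, 2))  # bounds equal to the margins
restricted = bounded_fiber(A, t, (1, 2, 2, 2))    # cell (1,1) bounded by 1
print(len(unrestricted), len(restricted))
```

With these margins the unrestricted fiber has three tables, and bounding the cell (1,1) by 1 removes the table (2, 0, 0, 2), leaving two.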



As shown in Sections [3] and [4] as well as in Rapallo and Rogantin (2007), the constraint
$n \le b$ translates into a linear system by introducing dummy counts
$\bar{n}_1, \ldots, \bar{n}_k$ with $\bar{n}_h + n_h = b_h$ for all $h = 1, \ldots, k$. Therefore,
in the presence of upper bounds on the cell counts, the Markov basis must be computed
through a Universal Gröbner basis of the ideal $\mathcal{I}_{A_T}$.

The procedure to compute a Universal Gröbner basis of the ideal $\mathcal{I}_{A_T}$ is fully
described in Chapter 7 of Sturmfels (1996). Here we summarize the main steps of the
algorithm. Given the matrix $A_T$, its Lawrence lifting is a matrix $\Lambda(A_T)$ with
dimensions $(s+k) \times (2k)$ and with block representation

\[ \Lambda(A_T) = \begin{pmatrix} A_T & 0 \\ I_k & I_k \end{pmatrix} \tag{11} \]

where $0$ is a null matrix with dimensions $s \times k$ and $I_k$ is the identity matrix with
dimension $k \times k$.
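
The block structure of the Lawrence lifting is easy to build explicitly. The sketch below is an illustration only (the helper names `lawrence_lifting` and `apply_matrix` are hypothetical): it assembles the lifted matrix as a list of rows and checks that a move $m$ of $A_T$, lifted to $(m, -m)$, lies in the kernel of the lifted matrix.

```python
def lawrence_lifting(A):
    """Return the (s+k) x 2k block matrix [[A, 0], [I_k, I_k]] as a list of rows."""
    k = len(A[0])
    top = [list(row) + [0] * k for row in A]
    # Row h of [I_k, I_k]: the h-th unit vector repeated twice.
    bottom = [[1 if c == h else 0 for c in range(k)] * 2 for h in range(k)]
    return top + bottom


def apply_matrix(M, v):
    """Matrix-vector product with plain Python lists."""
    return [sum(a * x for a, x in zip(row, v)) for row in M]


# Row and column sums of a 2 x 2 table, cells ordered (1,1), (1,2), (2,1), (2,2).
A = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
lifted_matrix = lawrence_lifting(A)

m = [1, -1, -1, 1]                 # basic move of the 2 x 2 table
lifted_move = m + [-c for c in m]  # (m, -m) has 2k entries
print(apply_matrix(lifted_matrix, lifted_move))
```

The last $k$ rows enforce that the dummy coordinates change by exactly the opposite of the cell counts, which is how the upper bounds enter the linear system.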

The Universal Gröbner basis of $A_T$ is then computed with the algorithm below:

• Define $k$ new indeterminates $\bar{x}_1, \ldots, \bar{x}_k$;

• Compute a Gröbner basis of the toric ideal $\mathcal{I}_{\Lambda(A_T)}$ in the polynomial
ring $\mathbb{R}[x, \bar{x}]$, the toric ideal associated to the Lawrence lifting
$\Lambda(A_T)$ of $A_T$;

• Substitute $\bar{x}_h = 1$ for all $h = 1, \ldots, k$.

The interested reader can find all details and the proof of the correctness of this
algorithm in Sturmfels (1996), Chapter 7. In terms of Markov bases, we state the following
definition.



Definition 1. A Markov basis computed through a Universal Gröbner basis is a Universal
Markov basis.

Recall that a Universal Gröbner basis of the toric ideal $\mathcal{I}_{A_T}$ is formed by
binomials, while the corresponding Universal Markov basis is formed by moves, that is,
tables with integer entries. A Gröbner basis is a polynomial object, while a Markov basis
is a combinatorial object. As mentioned above, the connection between Gröbner and Markov
bases is given in Equation (9).

The following section is devoted to the computation of Universal Markov bases in 
special settings, such as incomplete tables, bounds acting on a subset of the full sample 
space, or strictly positive bounds. 



3 Universal Markov bases and incomplete tables 

The computation of Universal Markov bases is not easy in practice, for two distinct
reasons:

• The computation of a Universal Markov basis requires twice as many indeterminates as
the computation of the standard Markov basis;

• The number of moves of a Universal Markov basis increases quickly with the dimension
of the contingency table.

Example 1. Let us consider I x J contingency tables under the independence model. With
fixed marginal totals, and without upper bounds, a Gröbner basis is formed by all 2 x 2
minors (see Diaconis and Sturmfels (1998)). This fact can be proved theoretically and
does not need symbolic computations.

In this special case we are also able to characterize the Universal Gröbner basis.
Combining Algorithm 7.2 and Corollary 14.12 in Sturmfels (1996), the Universal Gröbner
basis is formed by all the binomials:

\[ x_{i_1 j_1} x_{i_2 j_2} \cdots x_{i_s j_s} - x_{i_2 j_1} x_{i_3 j_2} \cdots x_{i_1 j_s} \tag{12} \]

where $(i_1, j_1), (j_1, i_2), \ldots, (j_s, i_1)$ is a circuit in the complete bipartite graph
with I and J vertices.

This implies that the number of moves needed for the Universal Markov basis increases
much faster than for the Markov basis of the unbounded problem. To give an idea of this
increase, we present in the following table the number of moves of the Gröbner bases for
square I x I tables for the first values of I.





I                          2     3      4       5         6           7
Standard Markov basis      1     9     36     100       225         441
Universal Markov basis     1    15    204   3,940   113,865   4,027,161
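
The counts in the table can be cross-checked with a short computation. The sketch below relies on a standard counting argument that is not stated in the paper: a circuit of length 2r in the complete bipartite graph uses r rows and r columns, and on a fixed choice of these there are r!(r-1)!/2 distinct circuits. The function names are hypothetical.

```python
from math import comb, factorial


def n_basic_moves(I, J):
    """Number of 2 x 2 minors of an I x J table."""
    return comb(I, 2) * comb(J, 2)


def n_circuits(I, J):
    """Number of circuits in the complete bipartite graph with I and J vertices."""
    total = 0
    for r in range(2, min(I, J) + 1):
        # choose r rows and r columns, then count the cycles through all of them
        total += comb(I, r) * comb(J, r) * factorial(r) * factorial(r - 1) // 2
    return total


for size in range(2, 7):
    print(size, n_basic_moves(size, size), n_circuits(size, size))
```

For I = 2, ..., 6 this reproduces both rows of the table. For I = 7 the same formula returns 4,662,231, which differs from the last entry as printed here, so that entry may be worth re-checking against the original computation.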



To overcome this difficulty it is of major interest to have some theoretical results on
the computation of Universal Markov bases. The first result in this direction that we
present in this section is related to tables with structural zeros (or incomplete tables).

Let $\mathcal{X}_0 \subset \mathcal{X}$ be the set of structural zeros of the table, let $T'$
be the function $T$ restricted to $\mathcal{X}' = \mathcal{X} \setminus \mathcal{X}_0$, and let
$\mathcal{I}_{A_{T'}}$ be the toric ideal associated with $A_{T'}$.
Theorem 1. Let $n$ be a contingency table and let $\mathcal{F}_T^b$ be its bounded fiber
under the bound $n \le b$. Let $\mathcal{X}_0$ be the set of structural zeros. Then a
Universal Gröbner basis for the ideal $\mathcal{I}_{A_{T'}}$ is obtained from the Universal
Gröbner basis of $\mathcal{I}_{A_T}$ by removing the binomials involving indeterminates in
$\mathcal{X}_0$.



Proof. Using Theorem 7.1 in Sturmfels (1996), the Universal Gröbner basis has the
following two properties: (a) it is unique; (b) it is a Gröbner basis with respect to all
term orderings on $\mathbb{R}[x]$.

Without loss of generality, let us suppose that the structural zeros are the first cells,
i.e., $\mathcal{X}_0 = \{1, \ldots, k'\}$. The unique Universal Gröbner basis is, from property
(b) above, a basis with respect to the elimination term ordering for the first $k'$
indeterminates. Then, we apply Theorem 4 in Rapallo (2006) and the elimination algorithm.

Following the scheme in Equations (6) through (8) with the matrix $\Lambda(A_T)$, we define
the polynomials

\[ f_h = x_h - \bar{y}_h \prod_{\ell=1}^{s} y_\ell^{T_\ell(h)} \quad \text{for } h = 1, \ldots, k \]

and

\[ f_{k+h} = \bar{x}_h - \bar{y}_h \quad \text{for } h = 1, \ldots, k. \]

The ideal in Equation (7) becomes

\[ \mathcal{I} = \langle f_1, \ldots, f_k, f_{k+1}, \ldots, f_{2k} \rangle \]

in the polynomial ring $\mathbb{R}[x, \bar{x}, y, \bar{y}]$. Therefore, the toric ideal
$\mathcal{I}_{\Lambda(A_T)}$ as in Equation (8) is

\[ \mathcal{I}_{\Lambda(A_T)} = \mathrm{Elim}(\{y, \bar{y}\}, \mathcal{I}). \tag{13} \]

When $x_1, \ldots, x_{k'}$ are indeterminates associated to structural zeros, the relevant
ideal is

\[ \mathcal{I}' = \mathrm{Elim}(\{x_1, \ldots, x_{k'}\}, \mathcal{I}) \]

and the Universal Gröbner basis of $\mathcal{I}_{A_{T'}}$ is computed through

\[ \mathrm{Elim}(\{y, \bar{y}\}, \mathcal{I}')
   = \mathrm{Elim}(\{y, \bar{y}\}, \mathrm{Elim}(\{x_1, \ldots, x_{k'}\}, \mathcal{I}))
   = \mathrm{Elim}(\{x_1, \ldots, x_{k'}\}, \mathrm{Elim}(\{y, \bar{y}\}, \mathcal{I}))
   = \mathrm{Elim}(\{x_1, \ldots, x_{k'}\}, \mathcal{I}_{\Lambda(A_T)}) \]

and then substituting $\bar{x}_h = 1$ for all $h$. As the Universal Gröbner basis is in
particular a basis with respect to the elimination term ordering for the indeterminates
$x_1, \ldots, x_{k'}$, this proves that removing the binomials involving
$x_1, \ldots, x_{k'}$ from $\mathcal{I}_{\Lambda(A_T)}$ is equivalent to computing the
Universal Gröbner basis for the incomplete table. □

If one has the Universal Markov basis for the complete configuration, Theorem [1]
applies easily. In fact, using the correspondence between moves and binomials, the theorem
above is clearly equivalent to the following:

Corollary 1. Let $n$ be a contingency table and let $\mathcal{F}_T^b$ be its bounded fiber
under the bound $n \le b$. Let $\mathcal{X}_0$ be the set of structural zeros. Then a
Universal Markov basis for $\mathcal{F}_{T'}^b$ is obtained from a Universal Markov basis for
$\mathcal{F}_T^b$ by removing the moves involving the cells in $\mathcal{X}_0$.

Example 2. Let us consider 4 x 4 contingency tables with fixed marginal totals, as in
Example [1]. Without structural zeros, the Universal Markov basis is formed by 204
binomials: 36 moves involving 4 cells; 96 moves involving 6 cells; and 72 moves involving 8
cells.

Suppose that the cell (1,1) is a structural zero. This kind of table is depicted below,
where 0 means a structural zero, while the symbol • denotes a non-zero cell.

    0 • • •
    • • • •
    • • • •
    • • • •

From the complete Universal Markov basis we can remove all moves involving the
structural zero. Applying Corollary [1], we remove: 9 moves involving 4 cells; 36 moves
involving 6 cells; and 36 moves involving 8 cells. The Universal Markov basis in this case
has 123 moves.

Suppose now that the whole main diagonal contains structural zeros, as in the figure
below.

    0 • • •
    • 0 • •
    • • 0 •
    • • • 0

In this situation we remove: 30 moves involving 4 cells; 80 moves involving 6 cells; and
66 moves involving 8 cells. Finally, the Universal Markov basis has only 28 moves.
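
The counts of this example can be reproduced by enumerating the circuits of the complete bipartite graph directly and discarding those that touch a structural zero. The sketch below is an illustration only; the names `circuit_moves` and `touches` are hypothetical, cells are indexed from 0, and a move is stored as the pair of its +1 cells and its -1 cells, with a move and its opposite counted once.

```python
from itertools import permutations


def circuit_moves(I, J):
    """All circuit moves of an I x J table, one representative per +/- pair."""
    seen = set()
    for r in range(2, min(I, J) + 1):
        for rows in permutations(range(I), r):
            for cols in permutations(range(J), r):
                plus = frozenset(zip(rows, cols))                    # (i_a, j_a)
                minus = frozenset(zip(rows, cols[1:] + cols[:1]))    # (i_a, j_{a+1})
                seen.add(frozenset((plus, minus)))
    return seen


def touches(move, zeros):
    """True if the move has a non-zero entry in one of the structural zeros."""
    return any(cell in zeros for part in move for cell in part)


moves = circuit_moves(4, 4)
one_zero = {(0, 0)}
diagonal = {(i, i) for i in range(4)}
print(len(moves))
print(sum(1 for mv in moves if not touches(mv, one_zero)))
print(sum(1 for mv in moves if not touches(mv, diagonal)))
```

The three printed numbers are 204, 123 and 28, in agreement with the counts given in the example.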

The last example is a prototype for the quasi-independence models. Now consider I x J
contingency tables with structural zeros under the quasi-independence model. Aoki and
Takemura (2005) computed a unique minimal Markov basis for I x J contingency tables with
structural zeros under the quasi-independence model.

Definition 2 (Aoki and Takemura (2005)). Let
$\mathcal{X} = \{(i,j) \mid 1 \le i \le I,\ 1 \le j \le J\}$ be the sample space and let
$\mathcal{X}' = \mathcal{X} \setminus \mathcal{X}_0$ be the set of cells that are not
structural zeros. Also let

\[ F_0(S) = \Bigl\{ m \ \Bigm|\ \sum_{i=1}^{I} m_{ij} = \sum_{j=1}^{J} m_{ij} = 0
   \ \text{and}\ m_{ij} = 0 \ \text{for}\ (i,j) \in \mathcal{X}_0 \Bigr\}. \]

A loop (or loop move) of degree $r$ on $\mathcal{X}'$ is an I x J integer array
$M_r(i_1, \ldots, i_r; j_1, \ldots, j_r) \in F_0(S)$, for $1 \le i_1, \ldots, i_r \le I$,
$1 \le j_1, \ldots, j_r \le J$, where $M_r(i_1, \ldots, i_r; j_1, \ldots, j_r)$ has the
elements

\[ m_{i_1 j_1} = m_{i_2 j_2} = \cdots = m_{i_{r-1} j_{r-1}} = m_{i_r j_r} = 1, \]
\[ m_{i_1 j_2} = m_{i_2 j_3} = \cdots = m_{i_{r-1} j_r} = m_{i_r j_1} = -1, \]

and all other elements are zero. Also the level indices $i_1, i_2, \ldots$ and
$j_1, j_2, \ldots$ are all distinct, i.e.

\[ i_m \ne i_n \ \text{and}\ j_m \ne j_n \quad \text{for all } m \ne n. \]

Specifically, a degree 2 loop $M_2(i_1, i_2; j_1, j_2)$ is called a basic move.

The support of a loop $M_r(i_1, \ldots, i_r; j_1, \ldots, j_r)$ is the set of its non-zero
cells. A loop $M_r(i_1, \ldots, i_r; j_1, \ldots, j_r)$ is called df 1 if
$R(i_1, \ldots, i_r; j_1, \ldots, j_r)$ does not contain the support of any loop on $S$ of
degree $2, \ldots, r-1$, where
$R(i_1, \ldots, i_r; j_1, \ldots, j_r) = \{(i,j) \mid i \in \{i_1, \ldots, i_r\},\ j \in \{j_1, \ldots, j_r\}\}$.



Corollary 2 (Aoki and Takemura (2005)). The set of df 1 loops of degree
$2, \ldots, \min\{I, J\}$ constitutes a unique minimal Markov basis for I x J contingency
tables with structural zeros under the quasi-independence model.

The examples above show that in many cases the computation of Universal Markov
bases for incomplete tables benefits from the results for complete tables. In terms of
computations, an incomplete table has fewer cells than the corresponding complete table
and therefore requires a smaller number of indeterminates. Nevertheless, in a complete
table with symmetric constraints the Markov bases can be characterized theoretically
(e.g., for the independence model presented here), and in many cases the symmetry of the
combinatorial problem can lead to substantial simplifications in the symbolic computation
(see in particular Aoki and Takemura (2008)). Moreover, following Theorem [1], in the
computation of Universal Markov bases through elimination we do not introduce new
polynomials and, therefore, we do not increase the degree of the moves, as usual in the
unbounded problems (see Rapallo (2006)).



Example 3. As a different example, where Markov bases are much simpler, we present
a computation for a $2^{3-1}$ fraction of a factorial design. Markov bases for fractions
are useful for experiments with a Poisson-distributed response variable, and the upper
bounds are needed when the response variable is Binomial (see Aoki and Takemura).
Here we consider the lattice $\{-1, 1\}^3$ for an experiment with 3 factors A, B, and
C. The fraction defined by the aliasing equation AB = 1 consists of 4 cells:

(-1,-1,-1), (-1,-1,1), (1,1,-1), (1,1,1). (14)

These 4 points can be viewed as an incomplete three-way table. Computing with CoCoA
(CoCoATeam, 2001) the standard Markov basis for this incomplete table under the complete
independence model (i.e., with the one-way marginal totals fixed), we obtain only
one move, represented by the binomial:

\[ x_{(-1,-1,-1)}\, x_{(1,1,1)} - x_{(-1,-1,1)}\, x_{(1,1,-1)} \tag{15} \]

From this computation we note that:

• In this example the standard Markov basis has only one polynomial and therefore it
is by definition a Universal Markov basis;

• The standard Markov basis for the corresponding complete table with 8 cells is formed
by 9 quadratic square-free binomials, and the corresponding Universal Markov basis
for the bounded problem has 20 binomials:

-x[-1,1,-1]x[1,-1,1] + x[-1,-1,-1]x[1,1,1],
-x[-1,1,1]x[1,-1,-1] + x[-1,-1,-1]x[1,1,1],
-x[-1,1,-1]x[1,-1,-1] + x[-1,-1,-1]x[1,1,-1],
x[-1,1,1]x[1,1,-1] - x[-1,1,-1]x[1,1,1],
-x[-1,-1,1]x[-1,1,-1] + x[-1,-1,-1]x[-1,1,1],
-x[-1,1,1]x[1,-1,-1] + x[-1,-1,1]x[1,1,-1],
x[-1,1,1]x[1,-1,-1] - x[-1,1,-1]x[1,-1,1],
x[-1,-1,1]x[1,-1,-1] - x[-1,-1,-1]x[1,-1,1],
-x[1,-1,1]x[1,1,-1] + x[1,-1,-1]x[1,1,1],
-x[-1,1,1]x[1,-1,1] + x[-1,-1,1]x[1,1,1],
-x[-1,1,-1]x[1,-1,1] + x[-1,-1,1]x[1,1,-1],
-x[-1,-1,1]x[1,1,-1] + x[-1,-1,-1]x[1,1,1],
-x[-1,1,1]x[1,-1,1]x[1,1,-1] + x[-1,-1,-1]x[1,1,1]^2,
-x[-1,-1,1]x[-1,1,-1]x[1,-1,-1] + x[-1,-1,-1]^2x[1,1,1],
x[-1,-1,1]x[1,1,-1]^2 - x[-1,1,-1]x[1,-1,-1]x[1,1,1],
x[-1,1,1]x[1,-1,-1]^2 - x[-1,-1,-1]x[1,-1,1]x[1,1,-1],
-x[-1,-1,-1]x[-1,1,1]x[1,-1,1] + x[-1,-1,1]^2x[1,1,-1],
x[-1,1,1]^2x[1,-1,-1] - x[-1,-1,1]x[-1,1,-1]x[1,1,1],
-x[-1,1,-1]^2x[1,-1,1] + x[-1,-1,-1]x[-1,1,1]x[1,1,-1],
-x[-1,1,-1]x[1,-1,1]^2 + x[-1,-1,1]x[1,-1,-1]x[1,1,1]

Notice that in a Metropolis-Hastings algorithm one can also make use of the complete 
Markov basis and then discard the chosen move at a given step if it modifies a cell with a 
structural zero. But the computations for this example show that the use of such a strategy 
leads to a slower convergence of the Markov chain to the stationary distribution. The use 
of the Markov basis with the unique applicable move is essential for a correct use of the 
Metropolis-Hastings algorithm.

4 Markov bases for partially bounded tables 

While the problem in the previous section has a positive answer, in this section we present 
a problem without a theoretical solution. Nevertheless, we show how to write the relevant 
symbolic computations and we describe explicitly some special examples. 

When working with bounded contingency tables, it is a common situation to have 
some cell counts bounded and other counts unbounded. Moreover, some bounds can be 
treated as unessential. In this section, we consider two-way contingency tables under 
independence model. 
It is well known that, given the marginal totals, each cell count cannot exceed
$\min\{n_{i+}, n_{+j}\}$, where $n_{i+}$ is the $i$-th row total and $n_{+j}$ is the $j$-th
column total. Thus, any bound exceeding such value can be ignored. Now, we know that:

• With no upper bounds, we need a Markov basis formed by the basic moves of the
form $\begin{pmatrix} +1 & -1 \\ -1 & +1 \end{pmatrix}$ for all 2 x 2 minors of the table;

• With an upper bound for each cell count, we need the Universal Markov basis formed
by all the closed circuits in the complete bipartite graph with I and J vertices, as
discussed in the previous section.

Example [1] shows that the differences between these two situations are noticeable in
terms of the number of moves. We can conjecture that with some cells bounded and other
cells without bounds we will fall into an intermediate situation, with a Gröbner basis
formed by all the degree-two binomials from the 2 x 2 minors and some other square-free
binomials.

As pointed out in the previous section, the bounds on the cell counts are represented
as linear constraints through the two identity matrices $I_k$ in the Lawrence lifting
$\Lambda(A_T)$, see Equation (11). Thus, for the computation of Markov bases for partially
bounded tables, we have to remove from the block $[I_k, I_k]$ of $\Lambda(A_T)$ the rows
corresponding to cells without upper bound.
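
The row removal described above can be written down explicitly. The following Python sketch is an illustration under stated assumptions: cells are numbered from 0 in row-major order, `bounded` is the set of cells that carry an upper bound, and the function names are hypothetical. It follows the description literally, so the dummy columns of the unbounded cells are left in place as zero columns.

```python
def independence_matrix(I, J):
    """Row-sum and column-sum constraints of an I x J table, cells in row-major order."""
    rows = [[1 if c // J == i else 0 for c in range(I * J)] for i in range(I)]
    cols = [[1 if c % J == j else 0 for c in range(I * J)] for j in range(J)]
    return rows + cols


def partial_lawrence_lifting(A, bounded):
    """Lawrence lifting keeping only the rows of [I_k, I_k] for the bounded cells."""
    k = len(A[0])
    top = [list(row) + [0] * k for row in A]
    bottom = [[1 if c == h else 0 for c in range(k)] * 2 for h in sorted(bounded)]
    return top + bottom


A = independence_matrix(3, 3)
only_first_cell = partial_lawrence_lifting(A, {0})        # bound on cell (1,1) only
main_diagonal = partial_lawrence_lifting(A, {0, 4, 8})    # bounds on the diagonal
print(len(only_first_cell), len(main_diagonal))
```

The resulting matrices would then be passed to a toric-ideal routine in a computer algebra system; that step is not shown here.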

To show the behavior of Universal Markov bases with partial bounds, we present here 
some numerical examples of Markov bases computed with CoCoA. 

Example 4. Consider a 3 x 3 contingency table under the independence model. With a bound
on all the cells, the Universal Markov basis has 15 moves: 9 moves of the form
$\begin{pmatrix} +1 & -1 \\ -1 & +1 \end{pmatrix}$ for all 2 x 2 minors of the table plus
the 6 moves of degree 3 below:

[Display: the six 3 x 3 moves $m_1, \ldots, m_6$ of degree 3.]

Now we have computed the Universal Markov basis in four different situations, with
different types of bounds:

• with a bound only on the cell (1,1), the Universal Markov basis has 10 moves: the
9 basic moves and $m_2$;

• with a bound on the three cells on the main diagonal, the Universal Markov basis
has 13 moves: the 9 basic moves, plus $m_1$, $m_2$, $m_4$ and $m_6$;

• with a bound on the five block-diagonal cells: (1,1), (2,2), (2,3), (3,2) and (3,3),
the Universal Markov basis has 12 moves: the 9 basic moves, plus $m_1$, $m_2$ and $m_4$;

• with a bound on all cells but the (1,1), the Universal Markov basis has 13 moves:
the 9 basic moves, plus $m_3$, $m_4$, $m_5$ and $m_6$.

Example 5 (Aoki and Takemura (2005)). Consider 6 x 6 contingency tables of the
following form:

[Display: 6 x 6 table pattern, with 0 marking the structural zeros and • the other cells.]

The reduced Gröbner basis with the degree reverse lexicographical ordering consists of
three basic moves, 20 degree 3 loops, 10 degree 4 loops, and 3 degree 5 loops. Note that
the loops of degree 4 and 5 are not df 1. On the other hand, all the 20 loops of degree 3
are df 1. Hence by Corollary [2], the above three basic moves and 20 degree 3 loops
constitute the unique minimal Markov basis.



5 Markov subbases for bounded and incomplete two-way contingency tables

Despite the computational advances presented in the previous sections, there are applied
problems where one may never be able to compute a Markov basis. Models of no-3-way
interaction and constraint matrices of Lawrence type seem to be arbitrarily difficult:
if we vary I and J for (I, J, K)-tables, the degree and support of elements in
a minimal Markov basis can be arbitrarily large (De Loera and Onn, 2005). In general,
the number of elements in a minimal Markov basis for a model can be exponentially
large. Thus, it is important to compute a reduced number of moves which connect all
tables instead of computing a Markov basis. Chen et al. (2010) discussed that in some
cases, such as logistic regression, positive margins are shown to allow a set of Markov
connecting moves that are much simpler than the full Markov basis. One such example
is shown in Hara et al. (2008), where a Markov basis for a multiple logistic regression is
computed by the Lawrence lifting of this basis. In the case of bivariate logistic regression,
Hara et al. (2008) showed a simple subset of the Markov basis which connects all fibers
with a positive sample size for each combination of levels of covariates. Such connecting
sets were formalized in Chen et al. (2006) with the terminology Markov subbasis.

In this section we use a sample space indexed as $\{1, \ldots, k\}$ instead of
$\{1, \ldots, I\} \times \{1, \ldots, J\}$ whenever possible, in order to make the formulae
easier to read.



Definition 3 (Chen et al. (2006)). A Markov subbasis $\mathcal{M}_{A_T, n_{obs}}$ for
$n_{obs} \in \mathbb{N}^k$ and integer matrix $A_T$ is a finite subset of
$\ker(A_T) \cap \mathbb{Z}^k$ such that, for each pair of vectors $u, v \in \mathcal{F}_T$,
there is a sequence of vectors $m_i \in \mathcal{M}_{A_T, n_{obs}}$, $i = 1, \ldots, l$, such
that

\[ u = v + \sum_{i=1}^{l} m_i, \qquad 0 \le v + \sum_{i=1}^{j} m_i, \quad j = 1, \ldots, l. \]

The connectivity through nonnegative lattice points only is required to hold for this
specific $n_{obs}$.

Note that a set which is a Markov subbasis $\mathcal{M}_{A_T, n_{obs}}$ for every
$n_{obs} \in \mathbb{N}^k$ and for a given $A_T$ is a Markov basis $\mathcal{M}_{A_T}$ for
$A_T$.

In this section we first study Markov subbases $\mathcal{M}^b_{A_T, n_{obs}}$ for any bounded
two-way contingency tables $n_{obs} \in \mathbb{N}^k$ with positive bounds, i.e., no
structural zeros, under the independence model. Then we study Markov subbases
$\mathcal{M}^b_{A_T, n_{obs}}$ for any incomplete I x J contingency tables
$n_{obs} \in \mathbb{N}^k$ with positive margins, i.e., $A_T(n_{obs}) > 0$, under the
independence model.

To analyze these cases we recall some definitions from commutative algebra:

• An ideal $\mathcal{I} \subset \mathbb{R}[x]$ is radical if

\[ \{ f \in \mathbb{R}[x] \mid f^n \in \mathcal{I} \ \text{for some } n \} = \mathcal{I}; \]

• Let $\mathcal{I}, \mathcal{J} \subset \mathbb{R}[x]$ be ideals. The quotient ideal
$(\mathcal{I} : \mathcal{J})$ is defined by:

\[ (\mathcal{I} : \mathcal{J}) = \{ f \in \mathbb{R}[x] \mid f \cdot \mathcal{J} \subset \mathcal{I} \}; \]

• Let $Z = \{z_1, \ldots, z_s\} \subset \mathbb{Z}^k$. A lattice $L$ generated by $Z$ is
defined by:

\[ L = \mathbb{Z} Z. \]

$M \subset \mathbb{Z}^k$ is called a lattice basis of $L$ if each element in $L$ can be
written as a linear integer combination of elements in $M$. Now a lattice basis for
$\ker(A_T)$ has the property that any two tables can be connected by its vector increments
if one is allowed to swing negative in the connecting path (see Chapter 12 of Sturmfels
(1996) for definitions and properties of a lattice basis).

The reader can find in Cox et al. (1992) more details on the definitions above.

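Theorem 2 (Chen et al. (2010)). Suppose $\mathcal{I}_M$ is a radical ideal, and suppose $M$
is a lattice basis. Let $p = x_1 \cdots x_k$. For each index $\ell$ with
$(A_T n_{obs})_\ell > 0$, let $\mathcal{I}_\ell = \langle x_h \rangle_{(A_T)_{\ell,h} > 0}$ be
the monomial ideal generated by indeterminates for cells that contribute to margin $\ell$.
Let $\mathcal{L}$ be the collection of indices $\ell$ with $(A_T n_{obs})_\ell > 0$. Define

\[ \mathcal{I}_{\mathcal{L}} = \mathcal{I}_M \cap \bigcap_{\ell \in \mathcal{L}} \mathcal{I}_\ell. \]

If

\[ (\mathcal{I}_{\mathcal{L}} : (\mathcal{I}_{\mathcal{L}} : p)) = \langle 1 \rangle \tag{16} \]

then the moves in $M$ connect all the tables in $\mathcal{F}_T$.

For computing the following examples we have used the software Singular (Greuel et al.,
2009).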



Example 6 (Continued). Consider again 3 × 3 tables with fixed row and column sums, which are the constraints from fixing sufficient statistics in the independence model, and with all cells bounded. This is equivalent to 3 × 3 × 2 tables with constraints [A,C], [B,C], [A,B] for factors A, B, C, which would arise for example in case-control data with two factors A and B at three levels each.

The constraint matrix that fixes row and column sums in a 3 × 3 table gives a toric ideal with a $\binom{3}{2} \times \binom{3}{2}$ element Gröbner basis. Each of these moves can be paired with its signed opposite to get 9 moves of 3 × 3 × 2 tables that preserve sufficient statistics. This is equivalent to 9 moves of the form

$$\begin{pmatrix} +1 & -1 \\ -1 & +1 \end{pmatrix}$$

for all 2 × 2 minors of the table for 3 × 3 tables under the independence model (see the earlier example on 3 × 3 tables). These elements make an ideal with a Gröbner basis that is square-free in the initial terms, and hence the ideal is radical (Proposition 5.3 of Sturmfels (2002)). Then applying Theorem 2 with nine margins of case-control counts, which is equivalent to having positive constraints on the bounds (namely, non-zero bounds for all cells), shows that these 9 moves do connect tables with positive case-control sums. The full Markov basis has 15 moves. Therefore, the Markov subbasis for this table is the standard Markov basis for a 3 × 3 table under the independence model.
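The count of basic moves used in this example can be reproduced directly. The sketch below is illustrative only and is not the Singular computation used in the paper: the helper names `basic_moves` and `preserves_margins` are hypothetical. It lists one sign of each move supported on a 2 × 2 minor and checks that every move leaves all row and column sums unchanged.

```python
from itertools import combinations

def basic_moves(I, J):
    """One sign of each basic move of an I x J table, as nested lists."""
    moves = []
    for i1, i2 in combinations(range(I), 2):
        for j1, j2 in combinations(range(J), 2):
            m = [[0] * J for _ in range(I)]
            m[i1][j1] = m[i2][j2] = 1      # +1 on one diagonal of the minor
            m[i1][j2] = m[i2][j1] = -1     # -1 on the other diagonal
            moves.append(m)
    return moves

def preserves_margins(m):
    """True if every row sum and every column sum of the move is zero."""
    return all(sum(r) == 0 for r in m) and all(sum(c) == 0 for c in zip(*m))

moves_3x3 = basic_moves(3, 3)
moves_4x4 = basic_moves(4, 4)
print(len(moves_3x3), len(moves_4x4))
print(all(preserves_margins(m) for m in moves_3x3 + moves_4x4))
```

The two counts are the binomial products $\binom{3}{2}\binom{3}{2}$ and $\binom{4}{2}\binom{4}{2}$ for 3 × 3 and 4 × 4 tables respectively.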

Example 7 (Chen et al. (2010)). Consider now 4 × 4 tables with fixed row and column sums as in Example 6, and with all cells bounded. Again, this is equivalent to 4 × 4 × 2 tables with constraints [A,C], [B,C], [A,B] for factors A, B and C, with factors A and B at four levels each.

The constraint matrix that fixes row and column sums in a 4 × 4 table gives a toric ideal with a $\binom{4}{2} \times \binom{4}{2}$ element Gröbner basis. Each of these moves can be paired with its signed opposite to get 36 moves of 4 × 4 × 2 tables that preserve sufficient statistics: the first level of the table carries a basic move, with entries $+1$ and $-1$ on the corners of a 2 × 2 minor and $0$ elsewhere, and the second level carries its negative.

These elements make an ideal with a Gröbner basis that is square-free in the initial terms, and hence the ideal is radical (Proposition 5.3 of Sturmfels (2002)). Then applying Theorem 2 with sixteen margins of case-control counts, which is equivalent to having positive conditions on the bounds (namely, non-zero bounds for all cells), shows that these 36 moves do connect tables with positive case-control sums. The full Markov basis has 204 moves. Therefore, the Markov subbasis for this table is the standard Markov basis for a 4 × 4 table with row and column sums fixed without bounds.

In practice, the algorithm in Theorem 2 is not feasible for a large number of cells in a table.

From Examples 6 and 7 it seems that for bounded two-way tables with row and column sums fixed we only need a standard Markov basis for two-way tables with row and column sums fixed if these bounds are positive. In fact, by the following theorem, additional elements in a universal Markov basis are needed only for incomplete tables, i.e., tables with structural zeros.

Theorem 3. Consider I x J tables with row and column sums fixed and with all cells 
bounded. If these bounds are positive, then a Markov subbasis for the tables is the standard 
Markov basis for I x J tables with row and column sums fixed without bounds, i.e., the 
set of basic moves of all 2 x 2 minors. 
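On a very small fiber the statement of Theorem 3 can be checked by brute force. The following is a minimal sketch under stated assumptions: the margins and bounds are a hypothetical example (3 × 3 tables, all row and column sums equal to 2, every cell bounded by 1), the function names are invented for illustration, and the exhaustive enumeration is only practical for tiny tables. It lists the fiber and counts the connected components under basic moves that respect the bounds.

```python
from collections import deque
from itertools import combinations, product

def fiber(rows, cols, bounds):
    """All tables with the given margins and 0 <= n_ij <= bounds[i][j] (brute force)."""
    I, J = len(rows), len(cols)
    ranges = [range(bounds[i][j] + 1) for i in range(I) for j in range(J)]
    tables = []
    for flat in product(*ranges):
        t = tuple(tuple(flat[i * J:(i + 1) * J]) for i in range(I))
        if [sum(r) for r in t] == list(rows) and [sum(c) for c in zip(*t)] == list(cols):
            tables.append(t)
    return tables

def neighbours(t, bounds):
    """Tables reached from t by one basic move (either sign) that stays within the bounds."""
    I, J = len(t), len(t[0])
    for (i1, i2), (j1, j2) in product(combinations(range(I), 2), combinations(range(J), 2)):
        for s in (1, -1):
            n = [list(r) for r in t]
            n[i1][j1] += s
            n[i2][j2] += s
            n[i1][j2] -= s
            n[i2][j1] -= s
            if all(0 <= n[i][j] <= bounds[i][j] for i in (i1, i2) for j in (j1, j2)):
                yield tuple(map(tuple, n))

def count_components(tables, bounds):
    """Number of connected components of the fiber graph (breadth-first search)."""
    seen, components = set(), 0
    for start in tables:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            for nb in neighbours(queue.popleft(), bounds):
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
    return components

bounds = [[1] * 3 for _ in range(3)]          # hypothetical positive bounds
tables = fiber((2, 2, 2), (2, 2, 2), bounds)  # hypothetical margins
print(len(tables), count_components(tables, bounds))
```

In this instance the six tables are the complements of the 3 × 3 permutation matrices, and a single component is what Theorem 3 predicts for positive bounds.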

In order to prove Theorem [3] we need the following proposition. 

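Proposition 1. Let $\mathcal{I}_h = \langle x_h, \bar{x}_h \rangle$ for $h = 1, \ldots, k = IJ$. Then we have:

$$\prod_{h=1}^{k} \mathcal{I}_h = \langle z_1 \cdots z_k \mid z_j = x_j \text{ or } \bar{x}_j \text{ for } j = 1, \ldots, k \rangle.$$

Proof. One can prove this proposition by induction on $k$. For $k = 2$, one can verify the statement using Singular (Greuel et al., 2009). Assume $\prod_{h=1}^{k} \mathcal{I}_h = \langle z_1 \cdots z_k \mid z_j = x_j \text{ or } \bar{x}_j \text{ for } j = 1, \ldots, k \rangle$ holds. We want to prove that $\prod_{h=1}^{k+1} \mathcal{I}_h = \langle z_1 \cdots z_{k+1} \mid z_j = x_j \text{ or } \bar{x}_j \text{ for } j = 1, \ldots, k+1 \rangle$. We have:

$$\begin{aligned}
\prod_{h=1}^{k+1} \mathcal{I}_h &= \Bigl( \prod_{h=1}^{k} \mathcal{I}_h \Bigr) \cdot \langle x_{k+1}, \bar{x}_{k+1} \rangle \\
&= \langle z_1 \cdots z_k \mid z_j = x_j \text{ or } \bar{x}_j \text{ for } j = 1, \ldots, k \rangle \cdot \langle x_{k+1}, \bar{x}_{k+1} \rangle \\
&= \langle z_1 \cdots z_{k+1} \mid z_j = x_j \text{ or } \bar{x}_j \text{ for } j = 1, \ldots, k+1 \rangle.
\end{aligned}$$

□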



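Let $M$ be the set of vectors such that

$$M = \{ \pm ( e_{i_1 j_1} + e_{i_2 j_2} - e_{i_1 j_2} - e_{i_2 j_1} ) \},$$

where $e_{ij}$ is defined as an integral array with $1$ at the cell $(i, j, 1)$ and $-1$ at the cell $(i, j, 2)$ and $0$ at every other cell. Also let

$$\mathcal{I}_M = \langle x_{i_1 j_1} x_{i_2 j_2} \bar{x}_{i_1 j_2} \bar{x}_{i_2 j_1} - x_{i_1 j_2} x_{i_2 j_1} \bar{x}_{i_1 j_1} \bar{x}_{i_2 j_2} \mid i_1 \neq i_2,\ j_1 \neq j_2 \rangle. \qquad (17)$$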
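The moves in $M$ live on $I \times J \times 2$ tables, and it may help to see that they fix all three two-way margins. The sketch below is illustrative and not part of the paper: it stores each move as a sparse dictionary keyed by 0-based cells `(i, j, level)`, the helper names are hypothetical, and the case $I = J = 3$ is chosen only for concreteness. It also checks that the positive part of each move has total degree 4, matching the two degree-4 monomials of each generator in Equation (17).

```python
from itertools import combinations

def lifted_move(i1, j1, i2, j2):
    """Sparse array for e_{i1 j1} + e_{i2 j2} - e_{i1 j2} - e_{i2 j1},
    where e_ij has +1 at (i, j, level 0) and -1 at (i, j, level 1)."""
    move = {}
    for (i, j), sign in (((i1, j1), 1), ((i2, j2), 1), ((i1, j2), -1), ((i2, j1), -1)):
        move[(i, j, 0)] = sign
        move[(i, j, 1)] = -sign
    return move

def margin(move, keep):
    """Sum the move over the axes not listed in keep; zero entries are dropped."""
    out = {}
    for cell, value in move.items():
        key = tuple(cell[a] for a in keep)
        out[key] = out.get(key, 0) + value
    return {k: v for k, v in out.items() if v != 0}

def in_kernel(move):
    """True if the [A,C], [B,C] and [A,B] margins all vanish (axes: 0 = A, 1 = B, 2 = C)."""
    return not (margin(move, (0, 2)) or margin(move, (1, 2)) or margin(move, (0, 1)))

moves = [lifted_move(i1, j1, i2, j2)
         for i1, i2 in combinations(range(3), 2)
         for j1, j2 in combinations(range(3), 2)]
print(len(moves), all(in_kernel(m) for m in moves))

# Total degree of the positive monomial of the binomial attached to each move.
degrees = {sum(v for v in m.values() if v > 0) for m in moves}
print(degrees)
```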

Proof of Theorem 3. Consider the ideal $\mathcal{I}_M$ in Equation (17). Its Gröbner basis is square-free in the initial terms, and hence the ideal is radical (Proposition 5.3 of Sturmfels (2002)). Since $\mathcal{I}_M$ in Equation (17) is radical, we use Theorem 2. Let $\mathcal{I}_A$ be the toric ideal associated with the constraint matrix of the tables $I \times J \times 2$ with constraints [A,C], [B,C], [A,B] for factors A, B, and C. We want to show

$$\Bigl( \mathcal{I}_M : \prod_{i=1,\ldots,I,\ j=1,\ldots,J} \mathcal{I}_{ij} \Bigr) = \mathcal{I}_A,$$

where $\mathcal{I}_{ij} = \langle x_{ij}, \bar{x}_{ij} \rangle$ for $i = 1, \ldots, I$, $j = 1, \ldots, J$. Clearly $\bigl( \mathcal{I}_M : \prod_{i=1,\ldots,I,\ j=1,\ldots,J} \mathcal{I}_{ij} \bigr) \subset \mathcal{I}_A$. Thus we want to show $\mathcal{I}_A \subset \bigl( \mathcal{I}_M : \prod_{i=1,\ldots,I,\ j=1,\ldots,J} \mathcal{I}_{ij} \bigr)$.

By Proposition 1, and Equation (5) on page 193 in Cox et al. (1992), we only have to show

$$\mathcal{I}_A \subset ( \mathcal{I}_M : z_{11} \cdots z_{IJ} )$$

where $z_{ij} = x_{ij}$ or $\bar{x}_{ij}$ for $i = 1, \ldots, I$ and $j = 1, \ldots, J$.

Let $f \in \mathcal{I}_A$. Then by the definition of the quotient ideal, we only have to show

$$(z_{11} \cdots z_{IJ}) \cdot f \in \mathcal{I}_M.$$

Assume $I \le J$ without loss of generality. Also if $I < J$, we can reduce all moves written in the form of (12) to $I \times I \times 2$ tables in which the other columns are zeros. Thus we consider $I \times I \times 2$ tables. We will prove this by induction on $I$. For $I = 3$, one can verify that the statement holds using Singular (Greuel et al., 2009). Assume that the statement holds for some $I - 1 \ge 3$. We want to show the statement holds for $I$. By the inductive assumption we can assume that $s = I$ in Equation (12). Let

$$f = x_{i_1 j_1} x_{i_2 j_2} \cdots x_{i_I j_I} \, \bar{x}_{i_2 j_1} \bar{x}_{i_3 j_2} \cdots \bar{x}_{i_1 j_I} - x_{i_2 j_1} x_{i_3 j_2} \cdots x_{i_1 j_I} \, \bar{x}_{i_1 j_1} \bar{x}_{i_2 j_2} \cdots \bar{x}_{i_I j_I}.$$

By the symmetry on the row and column operations on the table $I \times I \times 2$, without loss of generality we assume

$$f = x_{11} x_{22} \cdots x_{II} \, \bar{x}_{21} \bar{x}_{32} \cdots \bar{x}_{1I} - x_{21} x_{32} \cdots x_{1I} \, \bar{x}_{11} \bar{x}_{22} \cdots \bar{x}_{II}.$$

This is a binomial representation of a move on $I \times I \times 2$ tables



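$$\left(
\begin{pmatrix}
1 & 0 & \cdots & 0 & -1 \\
-1 & 1 & \cdots & 0 & 0 \\
\vdots & \ddots & \ddots & & \vdots \\
0 & 0 & \cdots & 1 & 0 \\
0 & 0 & \cdots & -1 & 1
\end{pmatrix},
\begin{pmatrix}
-1 & 0 & \cdots & 0 & 1 \\
1 & -1 & \cdots & 0 & 0 \\
\vdots & \ddots & \ddots & & \vdots \\
0 & 0 & \cdots & -1 & 0 \\
0 & 0 & \cdots & 1 & -1
\end{pmatrix}
\right),$$

whose positive and negative parts are the exponent arrays of the two monomials of $f$: the first monomial has ones at the cells $(1,1), (2,2), \ldots, (I,I)$ of the first array and at the cells $(2,1), (3,2), \ldots, (I,I-1), (1,I)$ of the second array, and the second monomial has these two patterns interchanged,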

where the first $I \times I$ table is the first level of the table and the second table is the second level. We claim that

$$(z_{11} \cdots z_{II}) \cdot f = \sum_{(i,j) = (1,2), \ldots, (I-1,I)} [\text{summand illegible in the source}] \qquad (18)$$

where the $I \times I$ exponent arrays $U(i,j)$ and $V(i,j)$ are defined recursively, with base case $(i,j) = (1,2)$ for $U$ and base case $(i,j) = (I-1,I)$ for $V$ [explicit arrays illegible in the source], and where $w \in \{0, 1\}$ is such that $w = 1$ if $z_{ij} = x_{ij}$ and $w = 0$ otherwise.

By the construction of each coefficient, each monomial in each term cancels out except the monomial with a negative sign in the first term of the sum and the monomial with a positive sign in the last term of the sum. Also simple calculations show that

[four displayed exponent arrays, including $u_2$ and $v_2$, illegible in the source].

Then we notice that

[displays involving $U(1,2)$, $U(I-1,I)$ and $V(I-1,I)$ illegible in the source].

Thus, $x^{u_2} \bar{x}^{v_2}$ equals the left-hand side in Equation (18).

□



Now we assume that the given margins are positive for bounded $I \times J$ tables, i.e., we assume that all row and column sums are positive. Without loss of generality, we can assume that all margins are positive because cell counts in rows and/or columns with zero marginals are necessarily zeros and such rows and/or columns can be ignored in the conditional analysis.

Let $\mathcal{S} = \{ (i,j) \mid 1 \le i \le I,\ 1 \le j \le J \}$ and let $\mathcal{S}_0$ be a non-trivial subset of $\mathcal{S}$. Recall that $\mathcal{S}_0$ is the set of structural zeros of the table. For Examples 8 and 9 we used Theorem 2.

Example 8. We consider 3 × 3 tables under the independence model with all cells bounded. We assume row and column sums are positive. We have studied for which $\mathcal{S}_0$ the standard Markov basis for 3 × 3 tables, i.e., the set of the 9 moves of the form

$$\begin{pmatrix} +1 & -1 \\ -1 & +1 \end{pmatrix}$$

for all 2 × 2 minors of the table, connects these bounded tables with positive conditions. If $|\mathcal{S}_0| = 1$ or $|\mathcal{S}_0| = 2$ then Equation (16) holds. Thus, these 9 moves connect bounded tables. For $|\mathcal{S}_0| = 3$, if $\mathcal{S}_0 = \{(1,1), (2,2), (3,3)\}$ after an appropriate interchange of rows and columns, i.e., there are 6 such patterns of $\mathcal{S}_0$, then Equation (16) does not hold. For all other patterns of $\mathcal{S}_0$, Equation (16) holds, and thus the 9 moves connect bounded tables. For $|\mathcal{S}_0| > 3$, if $\mathcal{S}_0$ contains $\{(1,1), (2,2), (3,3)\}$ after an appropriate interchange of rows and columns, then Equation (16) does not hold. For all other patterns of $\mathcal{S}_0$, Equation (16) holds, and thus these 9 moves connect bounded tables. Even with the positive margin assumption, if $\mathcal{S}_0 = \{(1,1), (2,2), (3,3)\}$, then the basic moves do not connect incomplete contingency tables, i.e., we need the universal Markov basis.
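The failure for diagonal structural zeros can be seen on the smallest fiber with positive margins. The following is a minimal sketch, not the Singular computation used in the example: it assumes all row and column sums equal to 1 (a hypothetical choice), uses 0-based indices, and the helper name `feasible_moves` is invented. With these margins the tables are exactly the permutation matrices that avoid the diagonal, and the code checks that no basic move can be applied to either of them.

```python
from itertools import combinations, permutations, product

zeros = {(0, 0), (1, 1), (2, 2)}   # structural zeros on the diagonal (0-based)

# With every row and column sum equal to 1, a table is a permutation matrix;
# p[i] is the column of the single 1 in row i, which must avoid the zeros.
fiber = [p for p in permutations(range(3))
         if all((i, p[i]) not in zeros for i in range(3))]

def feasible_moves(p):
    """Basic moves that keep the table nonnegative and add nothing to a structural zero."""
    table = {(i, p[i]): 1 for i in range(3)}
    found = []
    for (i1, i2), (j1, j2) in product(combinations(range(3), 2), repeat=2):
        diag = ((i1, j1), (i2, j2))
        anti = ((i1, j2), (i2, j1))
        for plus, minus in ((diag, anti), (anti, diag)):
            can_subtract = all(table.get(c, 0) >= 1 for c in minus)
            hits_zero = any(c in zeros for c in plus)
            if can_subtract and not hits_zero:
                found.append((plus, minus))
    return found

print(fiber)
print([feasible_moves(p) for p in fiber])
```

The fiber has two tables and neither admits a basic move, so the two tables cannot be connected by 2 × 2 minors alone; a move supported on six cells is needed to pass from one to the other.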

Example 9. We also consider 4 × 4 tables under the independence model with all cells bounded. We assume row and column sums are positive. After an appropriate interchange of rows and columns, if we have structural zero constraints on all diagonal cells (i.e., cells with indices in $\mathcal{S}_0 = \{ (i,j) : i = j \text{ for } i = 1, \ldots, I \}$), then Equation (16) does not hold.

Now we consider $I \times J$ contingency tables with only diagonal elements being structural zeros under the assumption of positive conditions on row and column sums. Aoki and Takemura (2005) showed the following proposition.



Proposition 2. Suppose we have I x J tables with fixed row and column sums. A set of 
basic moves is a Markov subbasis for I x J contingency tables, I, J > 4, with structural 
zeros in only diagonal elements under the assumption of positive marginals. 

From Examples 8 and 9 and Proposition 2 we have the following open problem.

Problem 1. Suppose we have $I \times J$ tables with fixed row and column sums. What is the necessary and sufficient condition on $\mathcal{S}_0$ so that a set of basic moves is a Markov subbasis for $I \times J$ contingency tables with structural zeros in $\mathcal{S}_0$ under the assumption of positive marginals?

6 Discussions 

In this paper we have studied Markov bases and Markov subbases for bounded contingency tables, showing several ways to compute them. While Theorem 1 applies to incomplete tables, Theorem 3 considers bounded tables with positive bounds. In particular, Theorem 3 shows that for two-way tables under the independence model with strictly positive bounds, the set of basic moves, which is much smaller than the universal Markov basis, connects the fibers with given margins. Thus, in practice we do not need to compute the universal Markov basis.

In order to solve Problem 1 we may be able to apply Theorem 2 and mimic the proof of Theorem 3. A solution to Problem 1 would be very useful in practice because we would know exactly when the set of basic moves of all 2 × 2 minors suffices for two-way incomplete contingency tables.